feat: add Amazon Textract integration (#2391) by zafatar · Pull Request #3148 · deepset-ai/haystack-core-integrations

zafatar · 2026-04-13T12:37:24Z

Add AmazonTextractConverter component that extracts text from images and single-page PDFs using the AWS Textract synchronous API. Supports both DetectDocumentText (plain OCR) and AnalyzeDocument (tables, forms, signatures, layout) as well as natural-language queries. Includes CI workflow, unit/integration tests, pydoc config, and repo-level wiring (labeler, coverage comment, README).

Related Issues

Partially addresses "Add support for AWS textract" (#2391)

Proposed Changes:

Like other converter integrations such as Azure Document Intelligence, and other Amazon integrations such as Amazon Bedrock, this component accesses Amazon Textract through boto3, using AWS credentials read from environment variables.
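As a rough illustration of the behaviour described above (not the code in this PR), the sketch below shows how a caller might choose between the two synchronous Textract operations. The helper names `build_textract_request` and `extract_text` are hypothetical; the operation names, the `FeatureTypes` and `QueriesConfig` parameters, and the `LINE` block type come from the boto3 Textract client.

```python
from typing import Any, Optional


def build_textract_request(
    document_bytes: bytes,
    feature_types: Optional[list[str]] = None,
    queries: Optional[list[str]] = None,
) -> tuple[str, dict[str, Any]]:
    """Hypothetical helper: pick the synchronous operation and build its kwargs."""
    features = list(feature_types or [])
    kwargs: dict[str, Any] = {"Document": {"Bytes": document_bytes}}

    if queries:
        # Queries are only served by AnalyzeDocument with the QUERIES feature type,
        # so enable it automatically when questions are supplied.
        if "QUERIES" not in features:
            features.append("QUERIES")
        kwargs["QueriesConfig"] = {"Queries": [{"Text": q} for q in queries]}

    if not features:
        # No analysis features requested: plain OCR via DetectDocumentText.
        return "detect_document_text", kwargs

    kwargs["FeatureTypes"] = features
    return "analyze_document", kwargs


def extract_text(client: Any, document_bytes: bytes, **options: Any) -> str:
    """Hypothetical helper: call Textract and join the detected LINE blocks."""
    operation, kwargs = build_textract_request(document_bytes, **options)
    response = getattr(client, operation)(**kwargs)
    lines = [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]
    return "\n".join(lines)
```

With real credentials, `client` would typically be `boto3.client("textract")`, which resolves `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_DEFAULT_REGION` from the environment through boto3's default credential chain. Keeping the request builder free of boto3 makes it easy to unit-test without network access.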

How did you test it?

The tests are run as two separate groups:

cd ./integrations/amazon_textract
hatch run test:unit
hatch run test:integration

Notes for the reviewer

Checklist

I have read the contributors guidelines and the code of conduct
I have updated the related issue with new insights and changes
I added unit tests and updated the docstrings
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.

CLAassistant · 2026-04-13T12:44:38Z

All committers have signed the CLA.

bogdankostic

Thanks for your PR @zafatar! It already looks good to me; I just left a few minor comments on how it can be improved further.

bogdankostic · 2026-04-14T12:21:20Z

...amazon_textract/src/haystack_integrations/components/converters/amazon_textract/converter.py

+            When provided, the Textract ``QUERIES`` feature type is enabled
+            automatically and each question is sent as a query. Answers are
+            included in the raw Textract response. Example:
+            ``["What is the patient name?", "What is the total due?"]``


Let's use single backticks here.

Suggested change

``["What is the patient name?", "What is the total due?"]``

`["What is the patient name?", "What is the total due?"]`

bogdankostic · 2026-04-14T13:46:40Z

integrations/amazon_textract/CHANGELOG.md

This file will be generated automatically once we do the release, so we can remove it here.

bogdankostic · 2026-04-14T13:52:30Z

integrations/amazon_textract/tests/test_amazon_textract_converter.py

Let's add some checks that warning messages are raised for cases that go wrong.
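One way such a check could look is sketched below, assuming the code under test logs a warning and skips a source it cannot read. The `read_sources` helper and the logger name are hypothetical stand-ins, not the converter from this PR; `caplog` is pytest's built-in log-capturing fixture.

```python
import logging
from pathlib import Path

logger = logging.getLogger("textract_sketch")


def read_sources(paths: list[str]) -> list[bytes]:
    """Hypothetical helper: read each file, warning and skipping on failure."""
    contents = []
    for path in paths:
        try:
            contents.append(Path(path).read_bytes())
        except OSError as error:
            logger.warning("Could not read %s. Skipping it. Error: %s", path, error)
    return contents


def test_warns_on_unreadable_source(caplog):
    # Capture WARNING-level records emitted while the helper runs.
    with caplog.at_level(logging.WARNING):
        result = read_sources(["does/not/exist.png"])

    # The bad source is skipped rather than raising, and the reason is logged.
    assert result == []
    assert "Could not read does/not/exist.png" in caplog.text
```

The same pattern should extend to other failure paths, such as a mocked Textract client that raises an error, as long as the component reports them through `logging`.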

zafatar requested a review from a team as a code owner April 13, 2026 12:37

zafatar requested review from bogdankostic and removed request for a team April 13, 2026 12:37

github-actions bot added the topic:CI and type:documentation labels Apr 13, 2026

feat: add Amazon Textract examples

543e1a6

zafatar and others added 5 commits April 13, 2026 14:53

fix: linting issue with the github workflow file

9563b7c

fix: typo causing failure of api-reference-build in CI

ef75627

fix: update naming inconvention

abe2ca4

fix: remove redundant imports and errors

1c4865d

Merge branch 'main' into main

7df0b14

bogdankostic requested changes Apr 14, 2026

bogdankostic added the integration:amazon-textract label Apr 14, 2026

zafatar wants to merge 7 commits into deepset-ai:main from zafatar:main

3 participants
